OntoSeek: Using Large Linguistic Ontologies for Accessing On-Line Yellow Pages and Product Catalogs
Abstract
To effectively exploit the mass of information available today on the Web, the key problem is that of content matching: the relevant information must be selected according to the user's needs, independently of the vocabulary and the syntax used to express it. Content matching seems to be an intrinsic problem for textual documents or web pages: current information retrieval techniques either rely on an encoding process that describes a given item according to a certain perspective or classification scheme, or perform a full-text analysis based on the search for user-specified words. Neither case guarantees content matching, because an encoded description may reflect only part of the content, and the mere occurrence of a word (or even a sentence) does not necessarily reflect the document's content. For general documents, there doesn't yet seem to be a much better option than some sort of lazy full-text analysis, leaving us to sift through endless result pages. There is, however, a relevant class of information repositories, namely online yellow pages and product catalogs, where content matching can be both feasible and crucial. In this paper, we first analyze the peculiarities of these repositories with respect to generic Web documents, and then we discuss the role that current linguistic ontologies like WordNet (Miller, 1995) can play in supporting content matching. We then present the architecture of a system called OntoSeek, specifically targeted to on-line yellow pages and product catalogs. The system is the result of a two-year cooperation between CORINTO (national research consortium for object technology, a partnership of IBM Semea, Apple Italia, and Selfin Spa) and LADSEB-CNR, as part of a project on retrieval and reuse of object-oriented software components (Borgo et al., 1997). OntoSeek adopts a language of limited expressiveness for content representation, and exploits a large linguistic ontology based on WordNet (namely SENSUS, developed at ISI-USC) for content matching.
In general, with respect to standard word-matching systems, expressing the content structure by means of a simple representation language increases the precision of the retrieval, while adopting a hierarchy of keywords increases both recall and precision. In OntoSeek, the use of a linguistic ontology results in two further advantages: a decoupling between the user vocabulary and the encoding terminology, and an additional increase of recall and precision due to synonymy handling and sense disambiguation. Our conclusion is that yellow pages and product catalogs constitute a strategic niche, where retrieval techniques based on simple representation capabilities and large linguistic ontologies appear to be particularly effective.
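The effects described above can be illustrated with a minimal sketch. It is not OntoSeek's implementation and does not use WordNet or SENSUS: the tiny ontology, the word senses, the catalog entries, and the function names (`subsumes`, `search`) are all hypothetical, chosen only to show how encoding items as concepts rather than words gives synonymy handling, sense disambiguation, and hierarchy-based matching.

```python
# Toy IS-A hierarchy: concept -> parent concept (all entries are invented).
PARENT = {
    "vehicle": None,
    "car": "vehicle",
    "truck": "vehicle",
    "jaguar_car": "car",
    "animal": None,
    "feline": "animal",
    "jaguar_animal": "feline",
}

# Toy lexicon: word -> list of senses (concept ids). Synonyms share a concept;
# an ambiguous word maps to several concepts.
SENSES = {
    "vehicle": ["vehicle"],
    "car": ["car"],
    "automobile": ["car"],
    "truck": ["truck"],
    "lorry": ["truck"],
    "jaguar": ["jaguar_animal", "jaguar_car"],
}

# Hypothetical catalog: each item is encoded as a concept, not as a word,
# so the encoder's vocabulary is decoupled from the user's vocabulary.
CATALOG = {
    "Rossi Auto Sales": "car",
    "Bianchi Haulage": "truck",
    "City Zoo Shop": "jaguar_animal",
    "Luxury Motors": "jaguar_car",
}


def subsumes(general, specific):
    """Return True if `general` is `specific` or one of its ancestors."""
    node = specific
    while node is not None:
        if node == general:
            return True
        node = PARENT[node]
    return False


def search(word, sense=0):
    """Map a query word to the chosen sense, then return every catalog item
    whose concept is that sense or a specialization of it."""
    concept = SENSES[word.lower()][sense]
    return [name for name, item in CATALOG.items() if subsumes(concept, item)]


if __name__ == "__main__":
    print(search("vehicle"))          # hierarchy: finds cars and trucks
    print(search("lorry"))            # synonymy: same concept as "truck"
    print(search("jaguar", sense=1))  # disambiguation: the car sense only
```

In this sketch a plain word match on "jaguar" would return both the zoo shop and the car dealer, while a query on "vehicle" would return nothing unless an item description happened to contain that exact word. Choosing a sense removes the first problem (precision), and walking the hierarchy removes the second (recall).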
Similar resources
OntoSeek: Content-Based Access to the Web
Perhaps you're among the many who've entered a search into a Web browser and received pages of links, only some relevant, many not? Dodging this pitfall, which bars the way to the Web's wealth of information, requires successful content matching. Current information-retrieval techniques either rely on an encoding process, using a certain perspective or classification scheme, to describe a given item, ...
ProdLight: A Lightweight Ontology for Product Description Based on Datatype Properties
Web pages representing offerings of products and services are a major source of data for Semantic Web-based e-commerce. This data could be useful for numerous applications, e.g. (1) more precise product search engines and shopping bots, (2) aggregation or enrichment of multi-vendor catalogs using public product descriptions, or (3) the automated discovery of additional alternatives based on the...
Yellow Pages on the Semantic Web
Yellow pages catalogs and corresponding directory services on the web are a widely used business concept for helping people to find companies providing services and selling products. When on the web, matching the customer’s need with the relevant services offered by companies is typically based on keyword search, table-based search, a list of service categories listed in some order, a hierarch...
DTL's DataSpot: Database Exploration Using Plain Language
DTL’s DataSpot is a database publishing tool that enables non-technical end users to explore a database using free-form plain language queries combined with hypertext navigation. DataSpot is based on a novel representation of data in the form of a schema-less semi-structured graph called a hyperbase. The DataSpot Publisher takes one or more possibly heterogeneous databases, predefined knowledge...
A Multilingual Natural Language Interface for E-Commerce Applications
In this paper we present a multilingual natural language interface architecture, which can be used for accessing online product catalogs and lets users formulate their queries in their native languages. In our interface architecture a rule-based machine-learning module replaces an elaborate semantic analysis component. The learning module learns the correct mappings of a user’s input to the cor...
Publication date: 2003